Genomic surveillance of SARS-CoV-2 in patients presenting neurological manifestations

During the first wave of infections, neurological symptoms in Coronavirus Disease 2019 (COVID-19) patients raised particular concern, suggesting that, in a subset of patients, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) could invade and damage cells of the central nervous system (CNS). Indeed, to date several in vitro and in vivo studies have shown the ability of SARS-CoV-2 to reach the CNS. Viral and/or host-related features could explain why this occurs only in certain individuals and not in the entire infected population. The aim of the present study was to evaluate whether the onset of neurological manifestations in COVID-19 patients was related to specific viral genomic signatures. To this end, the viral genome was extracted directly from nasopharyngeal swabs of selected SARS-CoV-2 positive patients presenting a spectrum of neurological symptoms related to COVID-19, ranging from anosmia/ageusia to more severe symptoms. By adopting a whole genome sequencing approach, here we describe a panel of known as well as unknown mutations detected in the analyzed SARS-CoV-2 genomes. While some of the mutations found had already been associated with improved viral fitness, no common signatures were detected when comparing viral sequences belonging to specific groups of patients. In conclusion, our data support the notion that COVID-19 neurological manifestations are mainly linked to patient-specific features rather than to peculiarities of the viral genome.


Introduction
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a member of the Coronaviridae family, a group of viruses that can infect both mammals and birds. SARS-CoV-2 has a probable zoonotic origin with bat as primary host [1,2]. Differently from endemic human stages of the disease [14,15], while the most common complications are stroke, neurological damage, ataxia, delirium, brain and spinal cord inflammation [13], encephalopathies and encephalitis [16,17]. Moreover, an increased incidence of Acute Disseminated EncephaloMyelitis (ADEM) has been reported in COVID-19 patients [18]. Different respiratory viruses, including Coronaviruses, are known to infect the brain tissue [19,20]. For example, neuroinvasion and neurovirulence of the human coronavirus OC43 (HCoV-OC43), SARS-CoV-1 and Middle East Respiratory Syndrome (MERS)-CoV were reported in transgenic mice [20,21]. Importantly, ACE2 was shown to be highly expressed in neurons, astrocytes, and oligodendrocytes [22,23]. The SARS-CoV-2 ability to infect the human brain has been further analyzed in human neural progenitor cells and brain organoids [24,25]. In addition, SARS-CoV-2 RNA was detected in brains [26] and in the cerebrospinal fluid (CSF) of some COVID-19 patients, indicating a possible pathophysiological involvement of the virus in the neurological symptoms [27]. The mechanisms accounting for SARS-CoV-2 related neurological manifestations have been deeply analyzed in many recent studies (reviewed in [11]). Up to now, two hypotheses have been proposed to explain the COVID-19 neuropathogenesis. On the one hand, SARS-CoV-2 may enter the CNS through the olfactory bulb by binding to the ACE-2 receptor, damaging the blood brain barrier (BBB) or even exploiting leukocytes as a Trojan horse mechanism [28]. The neurological symptoms would be, then, a consequence of the direct viral damage to brain tissues. 
In this context, specific regions of the viral genome could play a role in neurotropism and/or neurovirulence. On the other hand, SARS-CoV-2 replication in the lungs, in addition to causing pulmonary dysfunction, is known to trigger an intense and dysregulated systemic inflammatory process called cytokine storm, which is a common feature of severe COVID-19 cases. The cytokine storm has consequences for the entire organism and might affect the CNS, even without a direct invasion of the brain by the virus [28,29]. Finally, viral entry into the CNS may trigger a localized inflammatory response that could have an impact on the CNS by itself, or in combination with the cytokine storm [11,29].
Here we investigated whether viral genomes obtained from selected patients displayed peculiar mutations associated with the degree of severity of the neurological symptoms. To avoid artifacts due to the high viral mutation rates in vitro, we did not amplify the virus in cell culture and directly sequenced the viral genomic RNA obtained from the nasopharyngeal swabs of the enrolled individuals. First, we focused our attention on two SARS-CoV-2 envelope proteins: i) the spike glycoprotein, which is responsible for viral tropism and viral entry into target cells; ii) the E protein, which is linked to the modulation of the inflammatory responses in several Coronaviruses. Second, a whole genome analysis was performed to detect mutations that might be related to neurovirulence. Overall, our data indicate that no mutations are associated with specific neurological symptoms or with their severity, thus supporting the notion that CNS manifestations in COVID-19 patients are mainly linked to the individual inflammatory response rather than to peculiar viral features.

Study design
The study was approved by the Padua Hospital Research Ethics Committee (Protocol 056881). A group of patients displaying SARS-CoV-2 related neurological symptoms with different degrees of severity was selected, as shown in Table 2 (Results section). All enrolled subjects underwent Magnetic Resonance Imaging (MRI). Informed consent was obtained from all participants.
Nasopharyngeal swabs were handled using standard procedures to inactivate the virus. Viral RNA was directly extracted with an automatic nucleic acid extractor (MagNA Pure by Roche), following the manufacturer's instructions. The samples obtained were coded with letters from A to G, according to the chronological order of the extraction date. In particular, viral extracts from patients A, B, C, D, E and F were collected between March 4 and 27, 2020, while the viral RNA from patient G was extracted on November 23, 2020. Subsequent analyses were conducted blind, meaning that no information on symptoms was known until the overall sequencing results had been analyzed.

Sanger sequencing
First, genes of interest (E and S) were retro-transcribed from total RNA viral extracts with the AgPath-ID One-Step RT-PCR kit (Thermo Fisher), following the manufacturer's instructions. The sequences of the adopted oligonucleotides and their positions within the target are reported in Table 1 and in Fig 1, respectively. Briefly, for the E sequence the adopted oligonucleotides mapped at the 3' and 5' ends of the gene, while three pairs of primers were designed to reverse-transcribe the large S gene into three fragments of similar length (Fig 1). Next, to obtain a sufficient amount of cDNA, an additional PCR amplification step was carried out using the same primers described above along with the Phusion Hot Start II High-Fidelity DNA Polymerase (Invitrogen). The PCR products were purified with the NucleoSpin Gel and PCR Clean-up kit (Macherey-Nagel), and 6.4 pmoles were dried at 65˚C for a few minutes. Sanger sequencing of the PCR products was performed by BMR Genomics (Padua, Italy), adopting the same pairs of oligonucleotides reported in Table 1.

Fig 1. Positions where the primers adopted for reverse-transcription, amplification and sequencing of the E (light blue) and S (red, yellow and green) genes map within the viral regions of interest. While a single pair of oligonucleotides was designed to amplify and sequence the entire E gene, one pair of external primers along with two additional pairs of internal primers was used in the case of the S gene. "E For" stands for primer E forward; "E Rev" for primer E reverse; "S For" for primer S forward; "S1 Rev" for primer S1 reverse; "S2 For" for primer S2 forward; "S2 Rev" for primer S2 reverse; "S3 For" for primer S3 forward; "S Rev" for primer S reverse. https://doi.org/10.1371/journal.pone.0270024.g001

Whole genome sequencing
For whole genome sequencing, an amplicon-based approach targeting 343 partially overlapping subgenomic regions that cover the entire SARS-CoV-2 genome was used. Virus genomes were generated using Paragon Genomics' CleanPlex multiplex PCR Research and Surveillance Panel, according to the manufacturer's protocol [30,31]. Briefly, similar amounts of RNA were reverse transcribed with random primers and the resulting cDNA, magnetically purified, was used as template in a 10 μl multiplex PCR performed with two pooled-primer mixtures. Samples were treated with 2 μl of CleanPlex digestion reagent at 37˚C for 10 min to remove non-specific PCR products. After magnetic bead purification, PCR products were subjected to a further 25 rounds of amplification in a secondary PCR in which indexed primers generate the amplicon library. Subsequently, purified libraries were quantified with the Qubit DNA HS Assay Kit (Thermo Fisher Scientific). Amplicon libraries were loaded in a 300-cycle sequencing cartridge and deep sequencing was performed on the MiSeq platform (Illumina, San Diego, CA, USA). Raw sequencing data were checked for quality using FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc) and then analyzed with the SOPHiA GENETICS SARS-CoV-2 Panel software (SOPHiA GENETICS, Lausanne, Switzerland). To confirm data analyses, the paired-end reads were trimmed with Trimmomatic ver. 0.38 for quality (Q score > 25) and length (> 36 bp) and analyzed with the Geneious software (version 11.1.5) (Biomatters Ltd, New Zealand). The consensus sequence was reconstructed by mapping the reads to the SARS-CoV-2 reference sequence NC_045512 using Bowtie2 in sensitive-local mode with the consensus threshold at 65%. Variant calling was carried out with the Variant Finder Tool (Geneious), filtering out variants with a p-value greater than 0, using a minimum variant frequency of 0 and default parameters for Maximum Variant p-value (10⁻⁶).

Sequence alignment and classification of viral sequences into SARS-CoV-2 lineages and clades
The pool of sequences obtained by NGS was aligned with those reported in the GISAID online database (https://www.gisaid.org/, first accessed on the 15th of March 2021). Specifically, mutation analysis was performed using the CoVsurver application of GISAID (https://www.gisaid.org/epiflu-applications/covsurver-mutations-app/, last accessed on the 17th of May 2022). Classification into clades was performed through Nextstrain (https://nextstrain.org/SARS-CoV-2/, accessed on the 15th of March 2021), while lineage assessment was conducted using the Phylogenetic Assignment of Named Global Outbreak LINeages (PANGOLIN) tool available at https://github.com/hCoV-2019/pangolin [32] (accessed on the 15th of March 2021).

Phylogenetic analysis
Public SARS-CoV-2 complete genome sequences (>29 kb) available in the GISAID repository until December 2021 were retrieved. Low-quality genomes and nearly identical sequences (genetic similarity > 99.99%) were excluded, obtaining a global dataset of 2463 public genomes plus the 7 genomes reported in this study. The global dataset of 2470 whole genome sequences was aligned with MAFFT (FFT-NS-2 algorithm) using default parameters [33]. The alignment was manually curated using AliView [34] to remove artifacts at the ends. Phylogenetic analysis was performed using IQ-TREE (version 1.6.10) under the best-fit model according to the Bayesian Information Criterion (BIC), as indicated by the ModelFinder application implemented in IQ-TREE [35]. The statistical robustness of individual nodes was determined using 1000 bootstrap replicates.
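The exclusion of nearly identical sequences (genetic similarity > 99.99%) can be illustrated with a minimal Python sketch. It is not the procedure actually used in the study, whose tooling is not described: the function names, the greedy keep-first strategy and the choice to compare only columns where both sequences carry an unambiguous base are assumptions, and the toy sequences are invented.

```python
from typing import Dict, List


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical bases over columns where both sequences have A/C/G/T."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    valid = set("ACGT")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in valid and y in valid:  # skip gaps and ambiguous bases
            compared += 1
            matches += x == y
    return matches / compared if compared else 0.0


def drop_near_identical(aligned: Dict[str, str], threshold: float = 0.9999) -> List[str]:
    """Greedy filter: keep a sequence only if no already kept sequence exceeds the threshold."""
    kept: List[str] = []
    for name, seq in aligned.items():
        if all(pairwise_identity(seq, aligned[k]) <= threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Invented toy alignment, for illustration only.
    toy = {
        "ref": "ACGTACGTAC",
        "dup": "ACGTACGTAC",  # identical to "ref", so it is dropped
        "var": "ACGTACGTTC",  # one mismatch in ten columns, so it is kept
    }
    print(drop_near_identical(toy))
```

The greedy pass is quadratic in the number of sequences, so for thousands of genomes a dedicated clustering tool would likely be preferable; the sketch only makes the threshold criterion concrete.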

The S and E proteins shared a conserved sequence among the SARS-CoV-2 genomes under study
Patients enrolled in this study were all SARS-CoV-2 positive subjects belonging to one of the following three groups based on their neurological symptoms: symptomatic, mild-symptomatic, asymptomatic. Patients were coded with letters from A to G, according to the chronological order of the viral RNA extraction date. In this manuscript, for clarity, these capital letters are followed by the lower case letters "s", "m" or "a", representing the relative neurological symptomatology (symptomatic, mild-symptomatic, asymptomatic, respectively). In more detail:
• Patients As, Fs and Gs were severely symptomatic (s), displaying neurological signs already reported as associated with COVID-19, i.e. generalized seizures and encephalopathy. In addition to these common manifestations, patient Fs also had a stroke. As, Fs and Gs were hospitalized and displayed severe respiratory symptoms, requiring intubation.
• Patients Cm and Em were mildly symptomatic (m), showing a smell deficit from which only patient Em fully recovered.
• Patients Ba and Da were the negative controls, as they were SARS-CoV-2 positive but displayed neither neurological nor non-neurological symptoms [asymptomatic (a)].
None of the patients reported relevant comorbidities. Main information on the enrolled individuals is summarized in Table 2.
Total viral RNA was recovered directly from nasopharyngeal swabs of the selected patients, to avoid possible accumulation of mutations during SARS-CoV-2 amplification in cell culture. Next, the S and E envelope genes were Sanger sequenced. In the case of the S gene, all the analyzed sequences shared the D614G mutation [36]. In conclusion, no peculiar and common sequence signature was detected in the S or E genes of viral genomes derived from a specific group of patients, at least by Sanger sequencing.

Whole genome sequencing highlighted several known mutations present in the viral extracts but none common and unique of SARS-CoV-2 genomes retrieved from neurological symptomatic patients
Although all viral genomes under evaluation shared identical E and S genes, peculiar features associated with neurological symptoms could not be ruled out in other SARS-CoV-2 genomic regions. Thus, a whole-genome analysis by next generation sequencing was carried out starting from the same viral RNA extracts adopted for the E and S Sanger sequencing. Specifically, the Illumina MiSeq platform was applied to 343 partially overlapping subgenomic regions, obtained by an amplicon-based approach to cover the entire SARS-CoV-2 genome. The sequences obtained were aligned with the Wuhan-Hu-1 reference sequence (NCBI Reference Sequence: NC_045512.2). The Geneious Find Variations/SNPs program, used to evaluate the presence of mutations in our samples, finds variants above a minimum threshold to screen out disagreements due to read errors. The tool calculates p-values for variations and retains only the single-nucleotide variations (SNVs) below a specified maximum p-value. The lower the p-value, the more likely the variation at the given position represents a real variant. For this reason, we retained only variants with a p-value of at most 10⁻⁶, corresponding to a 0.0001% probability of seeing the variant by chance. We selected these stringent parameters, together with a minimum coverage of 10 reads, in order to find all the SNVs present in the patients' samples. Nevertheless, almost all of the identified mutations have a depth of more than 100 reads with a variant frequency >70%.
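The filtering criteria described above (minimum coverage of 10 reads, maximum p-value of 10⁻⁶) can be sketched in a few lines of Python. This is not the Geneious implementation: the `Variant` record, its field names and the `filter_variants` helper are hypothetical, and the example records carry invented coverage, frequency and p-values.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Variant:
    position: int      # 1-based coordinate on the reference genome
    ref: str           # reference base
    alt: str           # alternate base
    coverage: int      # number of reads spanning the position
    frequency: float   # fraction of reads carrying the alternate base
    p_value: float     # probability of observing the variant by chance


def filter_variants(
    variants: Iterable[Variant],
    min_coverage: int = 10,
    max_p_value: float = 1e-6,
) -> List[Variant]:
    """Keep variants with enough read support and a sufficiently small p-value."""
    return [
        v for v in variants
        if v.coverage >= min_coverage and v.p_value <= max_p_value
    ]


if __name__ == "__main__":
    # Invented records for illustration only; they are not values from Table 3.
    calls = [
        Variant(23403, "A", "G", coverage=850, frequency=0.99, p_value=1e-40),
        Variant(1000, "C", "T", coverage=6, frequency=0.80, p_value=1e-9),    # too shallow
        Variant(2000, "G", "A", coverage=400, frequency=0.02, p_value=1e-3),  # not significant
    ]
    for v in filter_variants(calls):
        print(v.position, v.ref, v.alt)
```

No frequency cut-off is applied in the sketch, mirroring the minimum variant frequency of 0 reported in the methods.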
The mutations found are reported in Table 3. Among the most significant ones, 4 mutations (genomic positions 241, 3037, 14408 and 23403 of the RNA letter code), characteristic of lineage B.1, were found in all the analyzed viral genomes. Another set was shared between the viral extracts obtained from 1 mild symptomatic patient (Cm) and 1 asymptomatic subject (Ba). This was a variation of 3 nucleotides at position 28881-2-3, which is a main feature of most sequences belonging to lineage B.1.1.
Among patients characterized by the most severe clinical situation, the viral RNA extracted from patients Fs and Gs shared the C28932T substitution in the RNA sequence. This mutation causes the change of an alanine (A) to an amino acid of similar properties, valine (V), in position 220 of the amino acid sequence (A220V). The A220V non-synonymous mutation is located within the viral nucleocapsid N protein. The alignment with sequences present in the Global Initiative on Sharing All Influenza Data (GISAID) repository until the 17th of May 2022 showed that the A220V mutation has a frequency of roughly 1.6% (Table 4). A220V has been reported in viral genomes from 113 countries, with the first one reported in February 2020 in Morocco, and was predominant in Spain during summer 2020. Interestingly, A220V displays a distribution pattern similar to that of the A222V mutation of the S glycoprotein [37]. In fact, both mutations are generally associated in the 20A.EU1 clade. However, both Fs and Gs viral extracts lack the A222V change. Interestingly, while the Fs nasopharyngeal swab was performed in early March 2020, the Gs swab was obtained in November of the same year. Of note, the third severely symptomatic patient (As) lacks the A220V polymorphism.

Table 3 (excerpt). Position 626, G>T, ORF1ab (leader), non-synonymous: V121F in one genome, wild type (WT) in the other six. Position 3037, C>T, ORF1ab (nsp3), synonymous (F924).

In addition to known mutations, viral genomes obtained from patients As (symptomatic), Ba (asymptomatic) and Cm (mild-symptomatic) displayed "extra" polymorphisms that are less studied. When sequences were aligned with those reported in the online GISAID database until the 17th of May 2022 (Table 4), these extra polymorphisms were characterized, in general, by a frequency below 1%, with the nucleotide substitution G25459T (patient Ba) approaching 0.1% and T11652C (patient As) approaching 0%.
Among the mutations reported in Table 4, the most interesting are the following:
• C25433T in ORF3a of viral extract As: at the amino acid level this mutation translates into T14I, with a reported frequency of around 0.08%. This mutation was shared by 7961 sequences present in the GISAID database (until May 2022), reported in 90 countries, with the first sequence dating back to March 2020. This substitution changes a polar amino acid, threonine (T), to a non-polar one, isoleucine (I).
• T29568C in ORF10 of viral extract As: it leads to an amino acid substitution from a non-polar amino acid, isoleucine (I), to a polar one, threonine (T), i.e. I4T. GISAID analysis (May 2022) did not find sequences with this mutation in the database. However, it is likely that, in a 29.9 kb-long genome, a mutation located at position 29600 is in a non-coding region.
• G25459T in ORF3a of viral extract Ba: it translates into an amino acid change from a non-polar amino acid, alanine (A), to a polar one, serine (S), named A23S. It has a frequency of 0.11% and has been reported in 11614 sequences from 99 countries (GISAID, May 2022), with the first sequence deposited in February 2020 in the Netherlands.
• T18417C in ORF1ab of viral extract As: it is the only synonymous mutation reported in patient As. This mutation is located in nsp14, the protein responsible for the proof-reading activity of the polymerase.
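The distinction between synonymous and non-synonymous substitutions used in the list above follows from translating the reference and the mutated codon with the standard genetic code. The Python sketch below shows the principle only: `classify_substitution` is a hypothetical helper, and the codons in the example are illustrative choices that produce the same amino acid changes as those named in the text; they were not read from the NC_045512 reference.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon: str, offset: int, alt_base: str) -> str:
    """Translate a codon before and after a single-base change at `offset` (0, 1 or 2)."""
    codon = codon.upper().replace("U", "T")
    alt_base = alt_base.upper().replace("U", "T")
    mutated = codon[:offset] + alt_base + codon[offset + 1:]
    ref_aa = CODON_TABLE[codon]
    alt_aa = CODON_TABLE[mutated]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return f"{ref_aa}->{alt_aa} ({kind})"


if __name__ == "__main__":
    # Illustrative codons only, not taken from the reference genome.
    print(classify_substitution("ACT", 1, "T"))  # threonine codon, C to T at the second base
    print(classify_substitution("GCT", 0, "T"))  # alanine codon, G to T at the first base
    print(classify_substitution("TTC", 2, "T"))  # phenylalanine codon, C to T at the third base
```

Applying the same idea to a real genome additionally requires the reading frame of each ORF, which is what turns a genomic coordinate such as 25433 into a codon number such as 14 in ORF3a.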
It must be noted that there were no sequences reported in the online database with 100% similarity to the sequences obtained from patients As, Ba and Cm, at least up to May 2022.
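The search for a group-specific signature amounts to a set operation: a mutation qualifies only if every genome in a group carries it and no genome outside the group does. The Python sketch below illustrates this with the mutations named in the running text only; Tables 3 and 4 may list further mutations (for example, the extra polymorphisms of patient Cm are not named in the text), so the dictionary is a partial, assumed encoding and `group_signature` is a hypothetical helper.

```python
from typing import Dict, List, Set

# Four SNVs reported in all seven genomes.
COMMON = {"C241T", "C3037T", "C14408T", "A23403G"}
# GGG to AAC at positions 28881-28883, reported for Ba and Cm.
TRIPLET = {"G28881A", "G28882A", "G28883C"}

# Partial encoding: only mutations named in the text are included.
MUTATIONS: Dict[str, Set[str]] = {
    "As": COMMON | {"C25433T", "T29568C", "T18417C", "T11652C"},
    "Ba": COMMON | TRIPLET | {"G25459T"},
    "Cm": COMMON | TRIPLET,
    "Da": set(COMMON),
    "Em": set(COMMON),
    "Fs": COMMON | {"C28932T"},
    "Gs": COMMON | {"C28932T"},
}

GROUPS: Dict[str, List[str]] = {
    "severe": ["As", "Fs", "Gs"],
    "mild": ["Cm", "Em"],
    "asymptomatic": ["Ba", "Da"],
}


def group_signature(
    group: str,
    mutations: Dict[str, Set[str]],
    groups: Dict[str, List[str]],
) -> Set[str]:
    """Mutations carried by every member of `group` and by nobody outside it."""
    shared = set.intersection(*(mutations[p] for p in groups[group]))
    outside = set().union(
        *(mutations[p] for g, members in groups.items() if g != group for p in members)
    )
    return shared - outside


if __name__ == "__main__":
    for name in GROUPS:
        print(name, group_signature(name, MUTATIONS, GROUPS))
```

With this encoding every group yields an empty set: C28932T is shared by Fs and Gs but is missing from As, so it does not qualify as a signature of the severely symptomatic group.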

Classification of the identified SARS-CoV-2 genomes into clades and lineages and phylogenetic analysis
Based on the NGS analysis, viral sequences were classified into their corresponding lineages using the Phylogenetic Assignment of Named Global Outbreak LINeages (PANGOLIN) tool. In more detail, the results showed that viral sequences from patients As, Da and Em belonged to the B.1 lineage, while those from patients Fs and Gs belonged to lineage B.1.177. In addition, the viral sequences from patients Cm and Ba were classified into lineage B.1.1.
Furthermore, a clade classification by Nextstrain was also obtained. The results showed that the viral sequences from patients As, Da, Em, Fs and Gs were all associated with clade 20A, while Ba and Cm belonged to clade 20B. The mutation shared between patients Fs and Gs was observed both in a branch of clade 20B and in clade 20E.EU1. However, it must be noted that the latter clade is defined by the presence of the A222V amino acid variation in the spike protein which, as already mentioned, was absent in these patients. In conclusion, there was no association between a specific viral lineage/clade and a specific patient group.
Finally, in order to accurately assess possible evolutionary relationships on a global scale among these seven Italian SARS-CoV-2 sequences, a maximum likelihood (ML) tree was  the severely symptomatic ones, gave rise to the B.1.177 cluster, while the remaining sequences from the other patients were located in accordance with their lineage but are scattered across the phylogenetic tree. Interestingly, phylogenetic analysis revealed that the GISAID sequences EPI_ISL_539548 and EPI_ISL_3716577, located at the beginning of B.1.177 branch before patients Fs and Gs, derived from symptomatic and hospitalized patients.

Discussion
COVID-19 is mainly a respiratory syndrome. However, since the beginning of the pandemic, extra-respiratory symptoms/complications have arisen in a subset of individuals. Among those, neurological manifestations, ranging from olfactory dysfunction to any other neurological manifestation, with the most common being encephalopathy, have been of great concern [11,38]. The mechanisms accounting for the onset of neurological symptoms/complications in certain COVID-19 patients are still a matter of study, especially in terms of the role directly played by the virus [11,28,38,39].
The aim of this study was to analyze whether SARS-CoV-2 genomes directly retrieved from the nasopharyngeal swabs of patients characterized by severe CNS manifestations displayed specific and common signatures not shared with those obtained from asymptomatic/mild symptomatic subjects. In particular, seven SARS-CoV-2 positive individuals were enrolled between March and November 2020. These patients were either fully asymptomatic or experienced variable neurological manifestations, including three subjects admitted to the intensive care unit of the Padua Hospital due to severe encephalopathy. In this cohort of patients, four were between 25 and 35 years old, one around 55 and two above 70 years old. Neurological symptoms and complications displayed by the selected subjects are reported in the literature among the most frequent COVID-19-related neurological manifestations [13,39]. All patients in the most severe neurological conditions presented important respiratory symptoms, such as interstitial pneumonia, and were intubated. This finding supports the notion that patients with severe disease are more likely to develop neurological disorders [40,41]. Finally, and not surprisingly [42,43], the three hospitalized patients were also the oldest of this cohort and two out of three were males.
Data demonstrate that SARS-CoV-2 can access the CNS and can potentially damage brain tissues by directly lysing infected cells and by triggering a localized inflammatory response [11,28]. On the other hand, SARS-CoV-2 replication in the lungs stimulates a cytokine storm that could also affect the CNS, even in the absence of a direct brain invasion by the virus [28,29]. The two SARS-CoV-2 envelope proteins S and E could play a role in both these mechanisms. Indeed, while the S protein is the main determinant of viral tropism [10,11], the E protein is involved in different steps of viral replication and in regulating virus/host cell interactions [44]. Studies focused on SARS-CoV-1, in particular, have shown that the E protein is a virulence factor involved in the activation of various inflammatory pathways [45]. Of note, the E genes of SARS-CoV-1 and SARS-CoV-2 share a high degree of sequence identity. Based on this evidence, we started by Sanger sequencing the S and E genes of SARS-CoV-2 genomes directly extracted from the nasopharyngeal swabs of the enrolled cohort. However, with the exception of the well-known and expected D614G mutation [36], present in the S gene and common to all the analyzed sequences, no other mutations were detected in either the S or E genes.
Next, the same viral RNA extracts underwent whole genome analysis, which classified the SARS-CoV-2 genomes under investigation into three different lineages: B.1, B.1.177 and B.1.1. More specifically, viral sequences obtained from patients As, Da and Em were linked to the B.1 lineage, which has been circulating since mid-January 2020 and is characterized by 4 predominant mutations. These mutations include D614G and P323L, which were observed in all the analyzed patients. The B.1 lineage was the most prevalent lineage in Europe during 2020 (Pangolin Cov-lineages: https://cov-lineages.org/resources/pangolin.html accessed on the 15th of March 2021). The viral sequences from patients Fs and Gs were linked to the B.1.177 lineage, which is a sub-lineage of the B.1 lineage. These sequences shared the same 4 mutations as before, with the addition of A220V in the nucleocapsid N protein. The B.1.177 lineage spread mainly in Europe during summer 2020 and is slowly disappearing (Pangolin Cov-lineages: https://cov-lineages.org/resources/pangolin.html).
The viral genomes sequenced from patients Cm and Ba were associated with the B.1.1 lineage, which derives from lineage B.1. These viral sequences were characterized by the additional presence of 3 consecutive SNVs at position 28881-2-3. This lineage was also prevalent in Europe, specifically in England, circulating since mid-February (Pangolin Cov-lineages: https://cov-lineages.org/resources/pangolin.html). The Nextstrain classification into clades showed that the viral sequences from patients As and Da, together with patients Em, Fs and Gs, were all classified into clade 20A, while those from patients Ba and Cm were classified into clade 20B. The 4 most common mutations shared by the B.1, B.1.177 and B.1.1 lineages were 4 SNVs: C241T, C3037T, C14408T and A23403G. These 4 SNVs, which have been widely described in the literature [46,47], appeared when the virus started circulating outside of China, and are widely distributed in Europe and in the USA. Indeed, since the beginning of March 2020, when, according to the databases, these mutations were present in 10% of the analyzed sequences, their frequency increased exponentially, reaching 78% by the end of May 2020 [48], and overall they constituted the dominant haplotype in Europe during 2020. Indeed, these mutations, which are located far apart from each other in the genome, have a strong allelic association, likely due to different factors, such as a founder effect or a gain in viral fitness [49].
C241T is a non-coding mutation located in the 5' UTR. Even though it does not change the amino acid sequence, it could still lead to a relevant change in the secondary structure of the RNA and/or modify the repertoire of interactions with viral and cellular proteins, thus affecting RNA replication or the speed of the infection cycle [28]. C3037T is a single nucleotide substitution at position 3037 of the RNA genome, located in the nsp3 gene, which encodes a phosphodiesterase. It is a silent (synonymous) mutation named F924. Although silent, this mutation may affect the RNA secondary structure, and therefore the interaction with other proteins [36].
C14408T is a non-synonymous mutation in nsp12, the RNA-dependent RNA polymerase [50]. This single nucleotide variation leads to an amino acid substitution from a proline (P) to a leucine (L) at position 323 of nsp12, which corresponds to the P4715L substitution in ORF1ab. This mutation is located outside the catalytic site, specifically in the interface domain of the protein, which is responsible for the interaction with other proteins [51]. According to the literature, this mutation could have multiple effects. Not only could it increase the interaction with the SARS-CoV-2 non-structural protein nsp8, which is a processivity factor, but it could also alter the interaction with the viral nsp14, an exonuclease responsible for the proof-reading activity of the polymerase. In this way, it could affect the fidelity of replication and be responsible for a higher mutation rate [50]. Furthermore, given its hydrophobicity, leucine tends to remain inside the secondary structure, rather than being exposed to the outside, as proline is; thus, the substitution could cause the loss of one of the nsp12 epitopes, leading to antibody escape [52].
The A23403G single nucleotide variation is in the spike-encoding gene. This SNV causes an amino acid change from a hydrophilic and negatively charged aspartic acid (D) to a hydrophobic and uncharged glycine (G), and is therefore called D614G. In the literature, this mutation is reported to be associated with increased virus infectivity and transmissibility [48,53]. However, the reason for this increase remains to be clarified. On the one hand, it is thought to be related to an increased affinity for the angiotensin converting enzyme 2 (ACE2) receptor which, in turn, increases the entry efficiency of the virus [54]. On the other hand, it seems to be related to a more efficient incorporation of the spike into newly formed virions, due to its greater stability and reduced shedding of the S1 subunit upon cleavage [55]. It must be noted that, overall, this set of four mutations was linked neither to greater disease severity, in terms of hospitalization rate, nor to reduced antibody neutralization [48].
Viral sequences from the asymptomatic patient Ba and the mild symptomatic patient Cm were characterized not only by the four mutations described above, but also by a variation of three nucleotides at position 28881-2-3 [56]. This three-nucleotide variation from GGG to AAC is located within the gene encoding the viral nucleocapsid N protein. These three SNVs have a strong allelic association and lead to an amino acid change at position 50-51 from arginine-glycine to lysine-arginine, thus bringing together two basic and polar amino acids, which could create a stronger binding between the N protein and the viral RNA. These mutations are expected to affect viral infectivity, due to the replacement of an arginine-glycine by a lysine-arginine in a serine-rich motif of the nucleoprotein [57]. This triplet characterizes most of the sequences deposited in GISAID belonging to the B.1.1 lineage, first reported in late February 2020. This lineage was predominant until June 2020, when its frequency started to decrease rapidly [37]. Indeed, there was a reversion of this mutated triplet into the original one, and a parallel spread of a new variant characterized by 2 spike mutations associated with an A220V mutation of the nucleocapsid N protein. However, the mutated triplet reappeared in the most recent variants, and it can also be found in sequences belonging to the Omicron variant (https://www.gisaid.org/ accessed on May 17th 2022).
Of those patients with the most severe symptoms, the viral RNA obtained from patients Fs and Gs showed the A220V amino acid change of the N protein. This mutation has a similar distribution pattern to the A222V spike mutation; these two mutations are generally associated and define the 20A.EU1 clade. This variant dated back to summer 2020 when it appeared in the Netherlands and then in Spain and spread throughout Europe, and with a lower frequency in the rest of the world [37,56]. The A220V in N protein propagated rapidly, associated with two spike mutations (L18F and A222V), in the second wave of the pandemic. However, in our cohort, these two spike mutations were absent in all analyzed patients. On the one hand, the A220V variation was only present in two severely symptomatic patients and neither in the controls, nor in the mild symptomatic patients. On the other hand, the A220V mutation was not shared by the third severely symptomatic patient As. Of note, the same mutation was found in two GISAID sequences (EPI_ISL_539548 and EPI_ISL_3716577) obtained from symptomatic patients.
The symptomatic patient As, the asymptomatic patient Ba and the mild-symptomatic patient Cm had unique mutations in their viral sequences. The alignment with sequences in the database showed that all these mutations were rare, with a frequency below 1%, and that none of them had been phenotypically characterized. Therefore, if the frequency of these non-synonymous mutations increases, it could be important to evaluate whether they affect the corresponding protein function. On the other hand, none of these "novel" mutations was shared among the three patients.
In conclusion, while interesting mutations were found in the viral genomes under study, the overall analysis shows no evident correlation between symptoms and viral sequence features. Furthermore, the MRI performed on the patients excludes the possibility of a brain-localized inflammatory response. In fact, the MRI of each patient was always negative, with the exception of one patient (Fs), in whom there was a concomitant ischemic episode. Hence, overall, our data support the notion that COVID-19 neurological manifestations are linked to the individual inflammatory response, rather than to a specific signature of the viral genome [58]. The main limitation of this pilot study is certainly the small number of enrolled patients. However, on the one hand, and as mentioned before, these data are worthy of attention, as they have been obtained by whole genome sequencing performed on viral RNA directly extracted from the nasopharyngeal swabs of the enrolled patients, without viral amplification in cell culture. This procedure is recommended by the WHO to avoid accumulation of extra mutations not reflecting the in vivo situation [59]. On the other hand, the sampled population is heterogeneous in terms, for instance, of age and sex. Indeed, despite the limited number of patients, we found an association between neurological and severe respiratory symptoms, and between symptom severity in general and age and sex, as reported in the literature [42,60]. Thus, the reported results are worthy of attention and deserve further validation in larger cohorts of patients.